Exact Pattern Matching for RNA Structure Ensembles
Abstract
ExpaRNA’s core algorithm computes, for two fixed RNA structures, a maximal non-overlapping set of maximal exact matchings. We introduce an algorithm, ExpaRNA-P, that solves the lifted problem of finding such sets of exact matchings in entire Boltzmann-distributed structure ensembles of two RNAs. Due to a novel kind of structural sparsification, the new algorithm maintains the time and space complexity of the algorithm for fixed input structures. Furthermore, we generalize the chaining algorithm of ExpaRNA in order to compute a compatible subset of ExpaRNA-P’s exact matchings. We show that ExpaRNA-P outperforms ExpaRNA in BRAliBase 2.1 benchmarks, where we pass the chained exact matchings as anchor constraints to the RNA alignment tool LocARNA. Compared to LocARNA, this novel approach shows similar accuracy but is six times faster.
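To make the idea of selecting a compatible, non-overlapping subset of exact matchings more concrete, the following is a minimal sketch of a chaining step. It is a deliberate simplification and not ExpaRNA's or ExpaRNA-P's actual algorithm: it treats each match as a pair of contiguous, non-empty intervals in two sequences and ignores base-pair structure entirely. The `Match` type, its fields, and the `chain` function are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical exact match: A[a_start:a_end] pairs with B[b_start:b_end].

    End positions are exclusive; intervals are assumed to be non-empty.
    """
    a_start: int
    a_end: int
    b_start: int
    b_end: int
    score: int


def chain(matches: List[Match]) -> List[Match]:
    """Pick a maximum-score subset of matches that do not overlap and keep
    the same left-to-right order in both sequences (quadratic-time DP).

    Simplified stand-in for structure-aware chaining: nesting via base
    pairs is NOT modelled here.
    """
    if not matches:
        return []
    # Any valid predecessor ends before its successor ends in sequence A,
    # so sorting by end position guarantees predecessors come first.
    ms = sorted(matches, key=lambda m: (m.a_end, m.b_end))
    best = [m.score for m in ms]   # best chain score ending at each match
    prev = [-1] * len(ms)          # back-pointers for traceback
    for j, mj in enumerate(ms):
        for i in range(j):
            mi = ms[i]
            compatible = mi.a_end <= mj.a_start and mi.b_end <= mj.b_start
            if compatible and best[i] + mj.score > best[j]:
                best[j] = best[i] + mj.score
                prev[j] = i
    # Trace back from the best-scoring chain end.
    k = max(range(len(ms)), key=best.__getitem__)
    result = []
    while k != -1:
        result.append(ms[k])
        k = prev[k]
    return result[::-1]


if __name__ == "__main__":
    candidates = [
        Match(0, 3, 0, 3, 3),
        Match(2, 6, 2, 6, 4),   # overlaps the first match in both sequences
        Match(6, 8, 7, 9, 2),
        Match(4, 5, 10, 11, 1),
    ]
    for m in chain(candidates):
        print(m)
```

In this toy input, the first two candidates overlap, so at most one of them can be chained; the sketch keeps whichever leads to the higher total score. A chain like this could then serve as a set of anchor constraints for a downstream aligner, which is the role the abstract describes for the chained exact matchings passed to LocARNA.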
Similar resources
Pattern Matching and Local Alignment for RNA Structures
The primary structure of a ribonucleic acid (RNA) molecule can be represented as a sequence of nucleotides (bases) over the alphabet {A,C,G,U}. The secondary or tertiary structure of an RNA is a set of base pairs which form bonds between A-U and G-C. For secondary structures, these bonds have been traditionally assumed to be one-to-one and non-crossing. This paper considers pattern matching...
Algorithms for pattern matching and discovery in RNA secondary structure
Text-indexing structures provide significant advantages in the solution of many problems related to string analysis and comparison, and are nowadays widely used in the analysis of biological sequences. In this paper, we present some applications of affix trees to problems of exact and approximate pattern matching and discovery in RNA sequences. By allowing bidirectional search for symmetric pat...
Exact matching of RNA secondary structure patterns
Many RNA structures are assembled from a collection of RNA motifs, which appear repeatedly and in various combinations. Identification of RNA structural motifs will enhance our understanding of RNA structures and functions. Searching for secondary structural patterns in sequence databases is the basic technique and fundamental problem for extracting and identifying such motifs. A number of algo...
Publications of Sebastian Will
[2] Sebastian Will 1 and Hosna Jabbari. Sparse RNA folding revisited: space-efficient minimum free energy structure prediction. quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Local exact pattern matching for non-fixed RNA structures. Structure-based whole genome realignment reveals many novel non-coding RNAs. CRISPRmap: an automated classification o...